Hybrid huberized support vector machines for microarray classification and gene selection
نویسندگان
چکیده
MOTIVATION The standard L(2)-norm support vector machine (SVM) is a widely used tool for microarray classification. Previous studies have demonstrated its superior performance in terms of classification accuracy. However, a major limitation of the SVM is that it cannot automatically select relevant genes for the classification. The L(1)-norm SVM is a variant of the standard L(2)-norm SVM, that constrains the L(1)-norm of the fitted coefficients. Due to the singularity of the L(1)-norm, the L(1)-norm SVM has the property of automatically selecting relevant genes. On the other hand, the L(1)-norm SVM has two drawbacks: (1) the number of selected genes is upper bounded by the size of the training data; (2) when there are several highly correlated genes, the L(1)-norm SVM tends to pick only a few of them, and remove the rest. RESULTS We propose a hybrid huberized support vector machine (HHSVM). The HHSVM combines the huberized hinge loss function and the elastic-net penalty. By doing so, the HHSVM performs automatic gene selection in a way similar to the L(1)-norm SVM. In addition, the HHSVM encourages highly correlated genes to be selected (or removed) together. We also develop an efficient algorithm to compute the entire solution path of the HHSVM. Numerical results indicate that the HHSVM tends to provide better variable selection results than the L(1)-norm SVM, especially when variables are highly correlated. AVAILABILITY R code are available at http://www.stat.lsa.umich.edu/~jizhu/code/hhsvm/.
منابع مشابه
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
We can reach by DNA microarray gene expression to such wealth of information with thousands of variables (genes). Analysis of this information can show genetic reasons of disease and tumor differences. In this study we try to reduce high-dimensional data by statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumor based on gene expression data of...
متن کاملIdentification of Alzheimer disease-relevant genes using a novel hybrid method
Identifying genes underlying complex diseases/traits that generally involve multiple etiological mechanisms and contributing genes is difficult. Although microarray technology has enabled researchers to investigate gene expression changes, but identifying pathobiologically relevant genes remains a challenge. To address this challenge, we apply a new method for selecting the disease-relevant gen...
متن کاملA Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data
The hybrid Huberized support vector machine (HHSVM) with the elastic-net penalty has been developed for cancer tumor classification based on thousands of gene expression measurements. In this paper, we develop a Bayesian formulation of the hybrid Huberized support vector machine for binary classification. For the coefficients of linear classification boundary, we propose a new type of prior, wh...
متن کاملDepartment of Statistics University of Missouri - Columbia TR - MU - STAT - 2009 - 09 - 07
Support vector machine (SVM) has been successfully applied for cancer tumor classification based on thousands of gene expression measurements. A modification of SVM known as hybrid Huberized support vector machine (HHSVM) has been developed for the same purpose along with an in built gene selection mechanism with the help of elastic-net penalty. In this paper we develop a Bayesian formulation o...
متن کاملFault Detection and Classification in Double-Circuit Transmission Line in Presence of TCSC Using Hybrid Intelligent Method
In this paper, an effective method for fault detection and classification in a double-circuit transmission line compensated with TCSC is proposed. The mutual coupling of parallel transmission lines and presence of TCSC affect the frequency content of the input signal of a distance relay and hence fault detection and fault classification face some challenges. One of the most effective methods fo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Bioinformatics
دوره 24 3 شماره
صفحات -
تاریخ انتشار 2008